Evaluating model fit in nonlinear multilevel structural equation models (MSEM) presents a 
challenge as no adequate test statistic is available. Nevertheless, using a product indicator 
approach, a likelihood ratio test for linear models is provided which may also be useful for 
nonlinear MSEM. The main problem with nonlinear models is that product variables are 
non-normally distributed. Although robust test statistics have been developed for linear 
SEM to ensure valid results under the condition of non-normality, they have not yet been 
investigated for nonlinear MSEM. In a Monte Carlo study, the performance of the robust 
likelihood ratio test was investigated for models with single-level latent interaction effects 
using the unconstrained product indicator approach. As overall model fit evaluation has a 
potential limitation in detecting the lack of fit at a single level even for linear models, level- 
specific model fit evaluation was also investigated using partially saturated models. Four 
population models were considered: a model with interaction effects at both levels, an 
interaction effect at the within-group level, an interaction effect at the between-group level, 
and a model with no interaction effects at either level. For these models the number of 
groups, predictor correlation, and model misspecification were varied. The results indicate 
that the robust test statistic performed sufficiently well. Advantages of level-specific 
model fit evaluation for the detection of model misfit are demonstrated. 



Keywords: multilevel structural equation modeling, interaction effect, level-specific model fit, likelihood ratio test, 
robust test statistic 



INTRODUCTION 

Multilevel structural equation modeling (MSEM) has gained 
increasing attention over the last decades, as it combines advan- 
tages of multilevel modeling (MLM) and structural equation 
modeling (SEM) (cf. Muthen, 1994; Mehta and Neale, 2005; 
Hox et al., 2010). MLM has been developed for the analysis of 
clustered data and attempts to partition observed variances and 
covariances into within- and between-group components, while 
SEM aims at modeling the variances and covariances by tak- 
ing the measurement errors into account. With the exception of 
cross-level interactions, MSEM generally incorporates linear rela- 
tionships among latent variables at the within-level and at the 
between-level. 

For the analysis of nonlinear single-level SEM with inter- 
action or quadratic effects in the structural model, sev- 
eral methods have been developed (for an overview see, 
e.g., Schumacker and Marcoulides, 1998; Marsh et al., 2004; 
Klein and Muthen, 2007; Moosbrugger et al., 2009; Brandt 
et al., in press). These approaches include distribution-analytic 
approaches (Klein and Moosbrugger, 2000; Klein and Muthen, 
2007), product indicator approaches (e.g., Joreskog and Yang, 
1996; Marsh et al., 2004; Little et al., 2006; Moosbrugger 
et al., 2009), Bayesian approaches (e.g., Lee et al., 2007; 
Song and Lu, 2010), and method of moment approaches 
(e.g., Wall and Amemiya, 2003; Mooijaart and Bentler, 2010; 
Brandt et al., in press). The most often used methods are 
the unconstrained product indicator approach (Marsh et al., 
2004) and the latent moderated structural equations approach 
(LMS; Klein and Moosbrugger, 2000). For the analysis of non- 
linear MSEM only these two approaches have already been 
applied. 

The unconstrained product indicator approach has been 
developed for the estimation of latent interaction effects in 
single-level SEM with robust properties when distributional 
assumptions are violated. Products of indicator variables need 
to be constructed to identify the latent product (interaction 
or quadratic) terms. The parameters related to the measure- 
ment model of the latent nonlinear term are freely estimated, an 
advantage compared to the constrained approach which mod- 
els these parameters as nonlinear functions of linear parameters 
(Joreskog and Yang, 1996). For parameter estimation a maximum 
likelihood (ML) method developed for linear models is used 
which assumes multivariate normality of the indicator variables, 
an assumption violated in latent interaction models. Although 
parameter estimators are asymptotically unbiased, standard errors 
are known to be generally underestimated (cf. Moosbrugger et al., 
2009). A model test that takes the nonnormality induced by 
product indicators into account has not been developed yet, 
but a test statistic for linear models based on the comparison 
of the empirical and the model-implied covariance matrix is 
available. 

LMS is a distribution-analytic approach which does not 
require the forming of product indicators. Instead, LMS exploits 
the specific type of nonnormality implied by latent nonlinear 
effects for parameter estimation by using conditional distributions 
to represent the nonlinearity in the model (cf. Klein and 
Moosbrugger, 2000; Kelava et al., 2011). The nonnormal density 
function of the joint indicator vector is approximated by a finite 
mixture distribution of multivariate normally distributed compo- 
nents. For parameter estimation a ML method is used especially 
tailored for nonlinear SEM. LMS parameter estimators are there- 
fore unbiased and highly efficient. A model test is not yet available 
as an adequate saturated model which in addition to the lin- 
ear relations in the model also takes the nonlinearity induced 
by product terms into account has not been defined yet. A $\chi^2$ 
difference test based on likelihood values is provided for testing 
the significance of single model parameters. The power to detect 
nonlinear effects is higher for LMS than for the unconstrained 
approach. 

Only recently researchers have started to investigate level- 
specific nonlinear effects (i.e., interaction or quadratic effects) in 
MSEM (Marsh et al., 2009; Leite and Zuo, 2011; Nagengast et al., 
2013). Using the unconstrained approach, Nagengast et al. (2013) 
tested the expectancy-value model of motivation in a nonlinear 
MSEM and found a significant latent interaction effect between 
homework expectancy and homework value in predicting home- 
work engagement at the within-group (student) level. Using 
LMS, Marsh et al. (2009) extended the tests of the big-fish-little- 
pond effect by investigating a latent quadratic effect of students' 
individual achievement and a latent interaction between gender 
and achievement on academic self-concept at the within-group 
level. However, these nonlinear effects did not reach statistical 
significance. 

Up to now only a single simulation study for nonlinear MSEM 
exists using the unconstrained approach (Leite and Zuo, 2011). In 
this study, two types of mean centering, i.e., grand-mean centering 
(cf. Marsh et al., 2009) and residual centering (cf. Little et al., 
2006), were applied for the analysis of a nonlinear MSEM with a 
single latent interaction effect at the between-group level. Results 
showed that both types of mean centering performed equally well 
for detecting the interaction effect when product indicators were 
highly reliable, while mean centering tended to perform slightly 
better for less reliable product indicators. 

These few studies already indicate that single nonlinear effects 
can be detected using both approaches. However, researchers are 
generally interested in the overall fit of the nonlinear MSEM and 
not only in the significance of a single parameter. Unfortunately, 
the model fit cannot be determined as no adequate test statistic 
is provided by either of the nonlinear approaches. Researchers 
therefore investigate the model fit of a linear MSEM using the 
$\chi^2$ test before including the product term in the model (cf. Nagengast 
et al., 2013), although this practice is questionable because the 
assumptions of multivariate normality and homoscedastic resid- 
uals are violated for the linear model if there are nonlinear effects 
in the population model. 

Evaluating the fit of nonlinear MSEM therefore presents a 
challenge. Although LMS does not provide any model test, the 
unconstrained product indicator approach nevertheless provides 
the likelihood ratio test developed for linear models. This test 
is based on the comparison of the unstructured and the model-implied 
covariance matrix with product terms included in the 
matrices as if they were observed variables. As the product variables 
are always nonnormally distributed, the overall model test 
does not follow a central $\chi^2$ distribution. 

However, most statistical programs provide a robust test statistic 
corrected for nonnormality in the data (cf. Bentler and 
Dijkstra, 1985; Satorra and Bentler, 1994; Yuan and Bentler, 
1998). Although the robust test statistic has originally been devel- 
oped to correct the inflated test statistic due to unwanted non- 
normality in the data, it may nevertheless correct the test statistic 
sufficiently well due to nonnormality resulting from products of 
normally distributed variables. In our simulation study the robust 
test statistic will therefore be used as if the normality assumption 
were just violated because of a multivariate nonnormality result- 
ing from, e.g., floor or ceiling effects, rather than from the specific 
type of nonnormality implied by latent interactions. 

The main goal of this study is to investigate the performance 
of the robust test statistic compared to the uncorrected ML test 
statistic for nonlinear MSEM using the unconstrained approach. 
In a Monte Carlo study we will investigate whether the robust 
test statistic is able to reliably detect misspecification of a nonlinear 
MSEM at the within-group level, at the between-group level, 
and at both levels simultaneously. Cross-level interaction, which 
occurs when the random slope of a within-group variable is predicted 
by a between-group level variable, will not be considered in this 
study, because this type of nonlinear effect poses a particular challenge 
for model fit evaluation. As level-specific model evaluation 
has been shown to be more informative for detecting the level at which 
the misfit occurs (cf. Ryu and West, 2009; Ryu, 2011), we will 
also investigate the level-specific model fit. 

NONLINEAR MULTILEVEL SEM 

In the following, we will use the unconstrained product indicator 
approach for the analysis of a nonlinear MSEM with interaction 
effects at both levels. This approach needs the forming of product 
variables as indicators of the latent interaction terms. The non- 
linear MSEM contains a covariance structure at each level, and 
components are needed in order to fit the model at the between 
and the within level. 

If data are collected from N individuals (i = 1, ..., N) nested 
in J groups (j = 1, ..., J), the data vector $y_{ij}$ of subject i in group 
(cluster) j is decomposed into the sum of the group (cluster) average 
component $y_{Bj}$ plus the individual deviation from the group 
average $y_{Wij}$: 

$$y_{ij} = y_{Bj} + y_{Wij} \qquad (1)$$

where the unobserved random components $y_{Bj}$ and $y_{Wij}$ are 
assumed to be independent with expected values $E(y_{Bj}) = \mu$ and 
$E(y_{Wij}) = 0$ (cf. Muthen, 1994; Yuan and Bentler, 2007). 
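
As a rough illustration of the decomposition in Equation (1), the following minimal sketch splits clustered scores into a between part and a within part. It uses observed group means as a stand-in for the between component, which is an assumption made here for simplicity: the study itself treats both components as latent. The function name `decompose` and the toy data are hypothetical.

```python
from statistics import mean


def decompose(groups):
    """Split clustered scores into between and within components.

    groups: list of lists, one inner list of scores per group (cluster).
    Returns (between, within), where between[j] is the observed mean of
    group j and within[j][i] is the deviation of subject i from that mean.
    Assumption: observed group means approximate the between component.
    """
    between = [mean(scores) for scores in groups]
    within = [[score - b for score in scores]
              for scores, b in zip(groups, between)]
    return between, within


# Hypothetical toy data: two groups with three subjects each.
groups = [[2.0, 4.0, 6.0], [10.0, 12.0, 14.0]]
between, within = decompose(groups)

# Each score is recovered as between part + within part, as in Equation (1).
rebuilt = [[b + w for w in ws] for b, ws in zip(between, within)]
```

By construction the within deviations sum to zero in every group, which mirrors the expected value of zero assumed for the within component.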

The measurement models for the endogenous random component 
vectors $y_{Bj}$ and $y_{Wij}$ are 

$$y_{Bj} = \mu + \Lambda^{y}_{B}\eta_{Bj} + \varepsilon_{Bj}, \qquad y_{Wij} = \Lambda^{y}_{W}\eta_{Wij} + \varepsilon_{Wij} \qquad (2)$$

where $\mu$ is a mean vector, $\Lambda^{y}_{B}$ and $\Lambda^{y}_{W}$ are the factor loading 
matrices, $\eta_B$ and $\eta_W$ are the latent criterion (dependent) variables, 
and $\varepsilon_B$ and $\varepsilon_W$ denote the residual vectors at the between 
and the within level. Analogously, the data vector $x_{ij}$ is also 
decomposed into two unobserved random component vectors, 
$x_{Bj}$ and $x_{Wij}$: 

$$x_{ij} = x_{Bj} + x_{Wij} \qquad (3)$$

where the vectors $x_{Bj}$ and $x_{Wij}$ are assumed to be independent 
with expected values $E(x_{Bj}) = \nu$ and $E(x_{Wij}) = 0$. 

The measurement models for the exogenous random component 
vectors $x_{Bj}$ and $x_{Wij}$ are 

$$x_{Bj} = \nu + \Lambda^{x}_{B}\xi_{Bj} + \delta_{Bj}, \qquad x_{Wij} = \Lambda^{x}_{W}\xi_{Wij} + \delta_{Wij} \qquad (4)$$

where $\nu$ is a mean vector, $\Lambda^{x}_{B}$ and $\Lambda^{x}_{W}$ are the factor loading 
matrices, $\xi_B$ and $\xi_W$ are the vectors of latent predictor and moderator 
(independent) variables, and $\delta_B$ and $\delta_W$ denote the residual 
vectors at the between and the within level. 

The unconstrained approach requires the forming of prod- 
uct indicators for defining the latent interaction terms. Although 
several alternative strategies exist for the construction of these 
indicators, most often used are the all-pair and the matched- 
pair strategies. Kenny and Judd (1984) as well as Joreskog and 
Yang (1996) used the all-pair strategy for creating all possible 
cross-products to define the latent interaction term, while using 
the matched-pair strategy Marsh et al. (2004) showed that it is 
sufficient to use each indicator of the latent predictor and the 
moderator variable only once in forming the cross-products. As 
cross-products can be created by using different combinations of 
the indicators, Marsh et al. (2004) suggested matching the indicators 
by reliability as the interaction effects were found to be 
estimated with more precision when the indicators with the high- 
est factor loadings were matched to form the cross-products. The 
matched-pair strategy requires the number of indicators of the 
latent predictor and the latent moderator variable to be the same 
(for strategies using unequal numbers of indicators, cf. Jackman 
et al., 2011 and Wu et al., 2013). 
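
The two construction strategies can be sketched as follows. This is only an illustration under stated assumptions: the indicator names and loading values are hypothetical, and ranking by factor loading is used as a simple proxy for matching by reliability.

```python
def all_pairs(indicators_1, indicators_2):
    """All-pair strategy: every indicator of the predictor is crossed
    with every indicator of the moderator."""
    return [(a, b) for a in indicators_1 for b in indicators_2]


def matched_pairs(loadings_1, loadings_2):
    """Matched-pair strategy: each indicator is used exactly once.

    loadings_1, loadings_2: dicts mapping indicator name -> factor loading.
    Indicators are ranked by loading (highest first) and paired by rank.
    """
    if len(loadings_1) != len(loadings_2):
        raise ValueError("matched pairs need equal numbers of indicators")
    ranked_1 = sorted(loadings_1, key=loadings_1.get, reverse=True)
    ranked_2 = sorted(loadings_2, key=loadings_2.get, reverse=True)
    return list(zip(ranked_1, ranked_2))


# Hypothetical loadings for three indicators per latent variable.
predictor = {"x1": 0.90, "x2": 0.70, "x3": 0.80}
moderator = {"x4": 0.60, "x5": 0.85, "x6": 0.75}

pairs_all = all_pairs(list(predictor), list(moderator))
pairs_matched = matched_pairs(predictor, moderator)
```

With three indicators per latent variable, the all-pair strategy yields nine product indicators and the matched-pair strategy yields three.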

When the structural models include predictor variable $\xi_{1B}$ and 
moderator variable $\xi_{2B}$ at the between level and predictor and 
moderator variables $\xi_{1W}$ and $\xi_{2W}$ at the within level, the measurement 
models for the latent interaction terms $\xi_{1B}\xi_{2B}$ and $\xi_{1W}\xi_{2W}$ 
are 

$$x_{kBj}x_{lBj} = \tau_B + \Lambda^{xx}_{B}\,\xi_{1Bj}\xi_{2Bj} + \delta^{xx}_{Bj},$$
$$x_{kWij}x_{lWij} = \tau_W + \Lambda^{xx}_{W}\,\xi_{1Wij}\xi_{2Wij} + \delta^{xx}_{Wij} \qquad (5)$$

where $\tau_B$ and $\tau_W$ are mean vectors, $\Lambda^{xx}_{B}$ and $\Lambda^{xx}_{W}$ denote the factor 
loading matrices, $x_{kBj}x_{lBj}$ are vectors of cross-products with 
k = 1, ..., K random components as indicators of $\xi_{1B}$ and l = 
1, ..., L (K = L) random components as indicator variables of $\xi_{2B}$, 
$x_{kWij}x_{lWij}$ are vectors of cross-products with k = 1, ..., K random 
components as indicator variables of $\xi_{1W}$ and l = 1, ..., L 
(K = L) random components as indicator variables of $\xi_{2W}$, and 
$\delta^{xx}_{Bj}$ as well as $\delta^{xx}_{Wij}$ denote the residual vectors. 

As nonlinear effects may occur at the between-group level, at 
the within-group level, or at both levels simultaneously, the structural 
equations for a model with a latent predictor and a latent moderator 
variable at the within level ($\xi_{1W}$, $\xi_{2W}$) and at the between level 
($\xi_{1B}$, $\xi_{2B}$), and a latent interaction term at both levels are then given by 

$$\eta_W = \gamma_{1W}\xi_{1W} + \gamma_{2W}\xi_{2W} + \gamma_{3W}\xi_{1W}\xi_{2W} + \zeta_W \quad \text{(within-group level)} \qquad (6)$$

$$\eta_B = \alpha + \gamma_{1B}\xi_{1B} + \gamma_{2B}\xi_{2B} + \gamma_{3B}\xi_{1B}\xi_{2B} + \zeta_B \quad \text{(between-group level)} \qquad (7)$$

where $\alpha$ is the overall mean, $\gamma_{1W}$, $\gamma_{2W}$, and $\gamma_{3W}$ are effects at 
the within level, $\gamma_{1B}$, $\gamma_{2B}$, and $\gamma_{3B}$ are effects at the between level, 
and $\zeta_W$ and $\zeta_B$ are disturbance terms. In applied research, the 
between-group level predictors do not have to match the within-group 
level predictors, but in this study, we will only consider the 
model in Equations (6) and (7) (see also Figure 1). 

Based on the decomposition in Equations (1) and (3) and 
under the assumption of identical covariance structures across 
groups and uncorrelatedness of within-group and between-group 
random components, the total covariance matrix $\Sigma^*_T$ of the 
data vectors y and x is augmented by the cross-products, and 
the augmented total covariance matrix is then the sum of 
the between-group covariance matrix $\Sigma^*_B$ and the within-group 
covariance matrix $\Sigma^*_W$ (cf. Yuan and Bentler, 2007): 

$$\mathrm{Cov}(y, x) = \Sigma^*_T = \Sigma^*_B + \Sigma^*_W \qquad (8)$$

where the asterisk denotes matrices augmented by product vari- 
ables. The nonlinear MSEM therefore contains a covariance 
structure at each level augmented by matched-pairs of product 
variables, and these level-specific covariance matrices are needed 
for model fit evaluation. 

MODEL FIT EVALUATION OF NONLINEAR MSEM 
OVERALL MODEL FIT EVALUATION 

For nonlinear MSEM, model fit evaluation is not as straightfor- 
ward as it is for linear MSEM. For linear MSEM the standard 
procedure (Ryu and West, 2009) is often used which is based on 
the comparison of the unstructured covariance matrix with the 
model-implied covariance matrix of the entire model comparable 
to single-level SEM. An often used method for parameter estimation 
is the ML method. The ML fit function leads to a test 
statistic $T_{ML}$ that is calculated as the product of the minimum of 
the fitting function $F_{ML}$ and (N - 1), where N equals sample size. 
Under the assumptions of a correctly specified model, multivariate 
normally distributed variables, and a sufficiently large sample 
size, $T_{ML}$ asymptotically follows a central $\chi^2$ distribution. A nonsignificant 
test statistic $T_{ML}$ indicates that the model fits the data. 
The smaller the difference between both covariance matrices is 
the better the model fits the data. 
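
To make the relation between the fitting function and the test statistic concrete, here is a minimal single-level sketch for two observed variables. It assumes the standard ML discrepancy function for covariance structures, F = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p, which the text does not spell out; the helper names and the example matrices are hypothetical, and the multilevel case is not covered.

```python
import math


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]


def trace_of_product(a, b):
    """Trace of the matrix product a @ b for 2 x 2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def f_ml(s, sigma):
    """ML discrepancy between sample matrix s and model-implied sigma."""
    p = 2  # number of observed variables
    return (math.log(det2(sigma)) - math.log(det2(s))
            + trace_of_product(s, inv2(sigma)) - p)


def t_ml(s, sigma, n):
    """Test statistic: (N - 1) times the minimum of the fitting function."""
    return (n - 1) * f_ml(s, sigma)


# Hypothetical example: the data show a covariance of 0.30, but the model
# implies uncorrelated variables.
sample_cov = [[1.0, 0.3], [0.3, 1.0]]
implied_cov = [[1.0, 0.0], [0.0, 1.0]]
statistic = t_ml(sample_cov, implied_cov, n=200)
```

When the model-implied matrix reproduces the sample matrix exactly, the discrepancy is zero, so the statistic grows only with the mismatch between the two matrices and with sample size.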

The main problem with model fit evaluation for nonlinear 
MSEM is well-known from the evaluation of single-level non- 
linear SEM: Model fit cannot be determined because a suitable 
saturated model does not exist (Klein and Schermelleh-Engel, 
2010). For nonlinear SEM as well as for nonlinear MSEM the target 
model is not nested within the saturated model that is represented 
by the unstructured covariance matrix. The unstructured 
covariance matrix is not appropriate for model fit evaluation of 
nonlinear MSEM because covariances do not contain any information 
about the nonlinearity (e.g., interaction effects) in the 
data. For that reason, the assessment of overall model fit for a 
nonlinear SEM is still an unresolved problem. 

FIGURE 1 | Path diagram of the nonlinear population MSEM with latent interaction effects at both levels. Product indicators were constructed using the 
matched-pairs strategy. 

Nevertheless, nonlinearity is contained in the product vari- 
ables that form the measurement models of the latent prod- 
uct terms. The covariances between the product indicators 
and the y-variables are therefore indicative of existing non- 
linear effects. For model fit evaluation the covariance matrix 
can therefore be augmented such that the new total covari- 
ance matrix comprises covariances between y-variables, 
x-variables, and the matched-pairs of cross-products of the 
x-variables. 

For model fit evaluation, the likelihood ratio test of exact fit for 
nonlinear MSEM can then be performed, which tests the hypothesis 
that both level-specific model-implied augmented covariance 
matrices are equal to their population matrices (cf. Ryu and West, 
2009): 

$$H_0\colon\ \Sigma^*_W = \Sigma^*_W(\theta) \ \text{and} \ \Sigma^*_B = \Sigma^*_B(\theta) \qquad (9)$$

where $\theta$ is the parameter vector. For this omnibus test the ML test 
statistic based on the augmented covariance matrices can then be 
written as 

$$T^*_{ML} = F_{ML}\left[\Sigma^*_W(\hat\theta), \Sigma^*_B(\hat\theta)\right] - F_{ML}\left[\Sigma^*_W(\hat\theta_s), \Sigma^*_B(\hat\theta_s)\right] \qquad (10)$$

where $\hat\theta$ denotes the vector of estimated parameters in the target 
model and $\hat\theta_s$ denotes the vector of estimated parameters in 
the saturated model. Under the assumption of correctly specified 
models at both levels, multivariate normality and a sufficiently 
large number of groups, the test statistic follows a central $\chi^2$ 
distribution with $df_T = df_B + df_W$ degrees of freedom. 

Unfortunately, augmenting the empirical covariance matrix 
by product variables implies a multivariate nonnormal distribu- 
tion. The reason is that products even of normally distributed 
variables are nonnormally distributed, i.e., highly kurtotic and 
often skewed (cf. Craig, 1936; Aroian, 1944; Moosbrugger et al., 
1997; Klein and Moosbrugger, 2000). Therefore the assumption 
of multivariate normality of the ML estimation method is always 
violated when product terms are added to a structural equation 
model. 

Depending on the strategy used for the construction of prod- 
uct terms, i.e., all-pair or matched-pair strategy, the amount of 
nonnormality in the data set differs. For example, if each latent 
predictor variable is measured by three indicators, for the all- 
pair strategy nine product terms have to be created, while for the 
matched-pair strategy only three product terms are needed. The 
all-pair strategy therefore produces a larger amount of nonnormality 
than the matched-pair strategy. If the covariance matrices 
are augmented by product variables formed with the matched-pair 
strategy, the amount of nonnormality is kept to a minimum. 
Therefore the matched-pair strategy is used for the simulation 
study. 
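
The nonnormality of product variables can be illustrated with a small simulation. The sketch below is not part of the study: it draws two independent standard normal variables and compares the kurtosis of one of them with the kurtosis of their product. For independent standard normal variables the theoretical kurtosis of the product is 9, compared with 3 for a normal variable. The seed and the sample size are arbitrary choices.

```python
import random


def kurtosis(values):
    """Moment-based kurtosis (fourth central moment over squared variance).
    A normally distributed variable has a kurtosis of 3."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n
    m4 = sum((v - center) ** 4 for v in values) / n
    return m4 / m2 ** 2


rng = random.Random(2014)  # arbitrary seed for reproducibility
n = 100_000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
product = [a * b for a, b in zip(x1, x2)]

kurt_normal = kurtosis(x1)        # close to 3
kurt_product = kurtosis(product)  # far above 3
```

Every product indicator added to the covariance matrix brings this kind of heavy-tailed variable into the analysis, which is why fewer product indicators imply less nonnormality.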

For model fit evaluation of linear MSEM it is recommended 
to use the rescaled test statistic of the ML estimator, $T_{MLR}$, when 
the normality assumption is violated (cf. Marsh et al., 2009; Hox, 
2010; Kim et al., 2012). $T_{MLR}$ adjusts the unscaled test statistic 
$T_{ML}$ downward as a function of the multivariate kurtosis and may 
therefore correct for the nonnormality due to highly kurtotic product 
variables sufficiently well. Robust test statistics are provided 
by several computer programs. In Mplus, the robust test statistic 
$T_{MLR}$ is provided, which is asymptotically equivalent to the Yuan-Bentler 
T2* test statistic (Muthen and Muthen, 1998-2012; see 
also Satorra and Bentler, 1994). In the simulation study we will 
use the rescaled test statistic based on the augmented covariance 
matrix, $T^*_{MLR}$, and compare it to the uncorrected test statistic $T^*_{ML}$. 

LEVEL-SPECIFIC MODEL FIT EVALUATION 

The standard approach for multilevel models evaluates the model 
fit for the entire model. However, this approach has some limitations 
(cf. Yuan and Bentler, 2007; Ryu and West, 2009). If both 
levels are evaluated simultaneously, a significant test statistic does 
not provide any information on the level at which the model is 
misspecified. Model misfit can exist at the between-group level, 
the within-group level or at both levels simultaneously. As sam- 
ple size is typically much larger at the within-level than at the 
between-level, a much heavier weight is given to the within-group 
model fit than to the between-group model fit for calculating the 
overall fit statistic. 

In order to deal with these problems two approaches exist. 
Yuan and Bentler (2007) proposed a segregating approach 
which fits the structural equation model at each level separately. 
They showed that model misfit can be detected satisfactorily and 
that the fit indices of single-level SEM can be extended to evaluating 
models at separate levels of a multilevel model. Ryu and West 
(2009, based on Hox, 2002) suggested estimating partially saturated 
models. Model fit for one level is evaluated while the other 
level is specified as a saturated model. This approach showed 
quite similar results compared to Yuan and Bentler's (2007) seg- 
regating approach for the within-group model, but seemed to 
perform better with regard to a slightly lower non-convergence 
rate, a mean chi-square statistic closer to the nominal value, and a 
smaller Type I error rate for estimating the correct between-group 
model. 

For evaluating the model fit of a nonlinear MSEM at the 
within-group level using the partially saturated approach, the 
within model is specified as the target model and the between 
model is specified as saturated. The test statistic for the partially 
saturated model is then 

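$$T^*_{PS\_W} = F_{ML}\left[\Sigma^*_W(\hat\theta), \Sigma^*_B(\hat\theta_s)\right] - F_{ML}\left[\Sigma^*_W(\hat\theta_s), \Sigma^*_B(\hat\theta_s)\right]. \qquad (11)$$

Any misfit at the within-group level is due to the discrepancy 
between $\Sigma^*_W(\hat\theta)$ and $\Sigma^*_W(\hat\theta_s)$. 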



For evaluating the model fit of a nonlinear MSEM at the 
between-group level, the between model is specified as the tar- 
get model and the within model as saturated. The test statistic is 
then obtained by 

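$$T^*_{PS\_B} = F_{ML}\left[\Sigma^*_W(\hat\theta_s), \Sigma^*_B(\hat\theta)\right] - F_{ML}\left[\Sigma^*_W(\hat\theta_s), \Sigma^*_B(\hat\theta_s)\right]. \qquad (12)$$

Any misfit at the between-group level is due to the discrepancy 
between $\Sigma^*_B(\hat\theta)$ and $\Sigma^*_B(\hat\theta_s)$. 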

The degrees of freedom at both levels are calculated compa- 
rable to MSEM with linear effects as the difference between the 
number of parameters in the saturated model and the number 
of parameters in the target model. In addition to evaluating the 
complete model we will also evaluate partially saturated models 
in the simulation study. 

METHODS 

We conducted a Monte Carlo study with the aim of investigating 
the performance of the robust test statistic $T_{MLR}$ compared 
to $T_{ML}$ for nonlinear MSEM. As these test statistics are often also 
denoted as $\chi^2$ tests, we will use the terms $\chi^2$ test and robust 
$\chi^2$ test in the following. The model used for this study is a nonlinear 
MSEM with interaction effects at both levels (see Figure 1; see 
also Equations 6 and 7). 

Using latent aggregation to account for sampling error 
the manifest indicators of the latent variables were split into 
their latent within and between components (see Figure 1). 
Latent aggregation is the default option in Mplus for treating 
within and between components as latent unobserved covariates 
(Asparouhov and Muthen, 2007). Using the FSCORES option 
in Mplus the estimated values of the latent components at the 
between-group level, $x_{Bj}$, were obtained from random intercept 
models. In the next step, the estimated values of the latent components 
at the within-group level, $x_{Wij}$, were calculated by simple 
subtraction. The vectors $x_{Wij}$ and $x_{Bj}$ can be regarded as latent 
within and between components of the manifest indicator variables 
$x_{ij}$ of the latent predictor variables. Finally, products of 
the within and between components, $x_{1W}x_{4W}, \ldots, x_{3B}x_{6B}$ (see 
Figure 1), were calculated using the matched-pair strategy. 

Data for four population models (M) with different numbers 
of nonlinear effects were generated: (1) a model with linear effects 
at the within-group level (W) and the between-group level (B), 
but no nonlinear effects (0) at either level (M_W0B0), (2) a 
model with an interaction effect (I) only at the within-group level 
(M_WIB0), (3) a model with an interaction effect only at the 
between-group level (M_W0BI), and (4) a model with interaction 
effects at both levels (M_WIBI). 

Depending on the model, population parameters $\gamma_{3W}$ (within-group 
level) and $\gamma_{3B}$ (between-group level) were set to either 0 or 
0.20. The indicators each had a reliability of 0.80. This resulted 
in factor loadings of 1.00 for the scaling variables and 0.894 for 
all other indicators. Accordingly, error variances were 0.25 for 
the scaling variables and 0.20 for all other indicators. The linear 
effects $\gamma_{1W}$, $\gamma_{2W}$ and $\gamma_{1B}$, $\gamma_{2B}$ were set to 0.30. The population 
mean of $\eta_B$ was set to zero by selecting the intercept $\alpha$ accordingly. 
The variances of the latent dependent variables $\eta_W$ and $\eta_B$ 
were set to 1.0, and the variances of the latent residuals $\zeta_W$ and $\zeta_B$ 
were selected accordingly with values between 0.82 (model with 
no interaction effects) and 0.72 (model with interaction effects 
and correlated predictors). Since population values at the within- 
and between-group level were set equal, the intraclass correlation 
coefficient was 0.50 for all manifest variables. These parameter 
values were held constant across all simulations. 
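
The reported population values can be checked with a few lines of arithmetic. The sketch below is an illustration under assumptions that are not stated explicitly in the text: the latent predictors are taken to be bivariate normal with unit variances, so that the product term has variance 1 + phi^2 and is uncorrelated with the linear terms. The function names are hypothetical.

```python
def reliability(loading, factor_variance, error_variance):
    """Share of indicator variance that is due to the latent variable."""
    true_variance = loading ** 2 * factor_variance
    return true_variance / (true_variance + error_variance)


def explained_variance(g1, g2, g3, phi):
    """Variance of g1*xi1 + g2*xi2 + g3*xi1*xi2.

    Assumes xi1 and xi2 are bivariate normal with zero means, unit
    variances, and correlation phi (an assumption made for this sketch).
    """
    linear = g1 ** 2 + g2 ** 2 + 2 * g1 * g2 * phi
    interaction = g3 ** 2 * (1 + phi ** 2)
    return linear + interaction


# Reliability of scaling indicators and of all other indicators.
rel_scaling = reliability(1.00, 1.0, 0.25)
rel_other = reliability(0.894, 1.0, 0.20)

# Residual variances that give the dependent variables a variance of 1.0.
residual_no_interaction = 1.0 - explained_variance(0.30, 0.30, 0.0, 0.0)
residual_full_model = 1.0 - explained_variance(0.30, 0.30, 0.20, 0.30)

# Equal variances at both levels imply an intraclass correlation of 0.50.
icc = 1.0 / (1.0 + 1.0)
```

Under these assumptions the residual variances come out at 0.82 for the model without interaction effects and at about 0.72 for the model with interaction effects and correlated predictors, in line with the values given above.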

The latent predictor variables at the between-group level and 
the within-group level, $\xi_{1B}$, $\xi_{2B}$ and $\xi_{1W}$, $\xi_{2W}$, as well as all residual 
variables were generated as multivariate normally distributed 
variables. 

SIMULATION CONDITIONS 

In the simulation study, three factors were varied: number 
of groups (three levels: NG = 200, 500, 1000), correlation of 
latent exogenous variables (two levels: $\phi_{21} = \mathrm{Corr}(\xi_{1W}, \xi_{2W}) = 
\mathrm{Corr}(\xi_{1B}, \xi_{2B})$ = 0, 0.30), and the number of nonlinear effects 
in the complete population model (four levels: no interaction 
effects, within-group interaction effect, between-group interaction 
effect, and interaction effects at both levels). The total 
number of conditions was therefore 3 x 2 x 4 = 24. The numbers 
of groups were selected to ensure convergence. Prestudies, 
not reported here, indicated estimation problems using fewer than 
200 groups. The number of subjects (NS) in each group was set 
to 30 to achieve a balanced design. Fixing the sample size of each 
group at NS = 30 yielded total sample sizes of 6000, 15,000, and 
30,000 subjects, respectively. The value for the latent predictor 
covariance $\phi_{21}$ was either 0 or 0.30. The amount of explained variance 
of the endogenous latent variables varied for the population 
models between 18 and 28%. 
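
The fully crossed design described above can be enumerated as follows. This is a minimal sketch, not the authors' simulation code; the variable and key names are hypothetical, and the population model labels follow the naming used in this article.

```python
from itertools import product

n_groups_levels = [200, 500, 1000]       # number of groups (NG)
correlation_levels = [0.0, 0.30]         # latent predictor correlation
population_models = ["M_W0B0", "M_WIB0", "M_W0BI", "M_WIBI"]
group_size = 30                          # subjects per group (NS)

# One dictionary per simulation condition of the crossed design.
conditions = [
    {
        "n_groups": n_groups,
        "phi_21": phi,
        "population_model": model,
        "total_n": n_groups * group_size,
    }
    for n_groups, phi, model in product(
        n_groups_levels, correlation_levels, population_models
    )
]

total_sample_sizes = sorted({c["total_n"] for c in conditions})
```

Each of the 24 resulting conditions would then be the unit for which data sets are generated and analyzed.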

For each condition 500 datasets were generated using the 
statistical software R (R Core Team, 2013), and each dataset 
was analyzed using the program Mplus, version 7 (Muthen and 
Muthen, 1998-2012). 

ANALYSIS MODELS 

Using the ML and the MLR estimation method, overall model 
fit was evaluated for complete multilevel models and for partially 
saturated models (see Table 1). Model fit of partially saturated 
models was evaluated level-specific by saturating one level and 
analyzing the other level, while the model fit for both levels was 
estimated by simultaneously analyzing complete models not sat- 
urated at any level. Model misspecification at one level or at both 
levels was either established by fixing the (existing) interaction 
effect to zero while keeping the latent interaction term in the 
structural equation, or by including the (nonexistent) interaction 
effect in the model. 

As the misspecified models and the correctly specified models 
are nested, it was also possible to conduct $\chi^2$ difference tests 
for the evaluation of single nonlinear effects. While the overall 
$\chi^2$ statistic tests all restrictions in the model simultaneously, the 
$\chi^2$ difference statistic only tests the significance of single parameters. 
The model difference test is generally preferred to the t-test, 
as standard errors of the t-test are known to be biased when 
the assumption of multivariate normality is violated. In order to 
determine the power of the $\chi^2$ difference tests as well as their 
Type I error rates, the unscaled difference value and not the 
scaled difference value proposed by Satorra and Bentler (2001) 
was used for nested model comparisons as suggested by Gerhard 
et al. (in press) and Cham et al. (2012) for nonlinear SEM. Nested 
model difference tests were performed for complete as well as for 
partially saturated models (cf. Table 1). 

Table 1 | Overview of analysis models used for overall model fit 
evaluation by means of $\chi^2$ tests and for evaluation of single 
interaction effects by means of $\chi^2$ difference tests for complete 
MSEM models and for partially saturated models (PS) at the 
within-group (Ws) or at the between-group (Bs) level. 

Population model    Type I error                Power 

$\chi^2$ test 
M_WIBI              WIBI                        W0BI 
                    PS_WIBs                     WIB0 
                    PS_WsBI                     PS_W0Bs 
                                                PS_WsB0 

$\chi^2$ difference test 
M_W0B0              W0B0 vs. WIBI 
                    PS_W0Bs vs. PS_WIBs 
                    PS_WsB0 vs. PS_WsBI 
M_WIB0                                          W0B0 vs. WIB0 
                                                PS_W0Bs vs. PS_WIBs 
M_W0BI                                          W0B0 vs. W0BI 
                                                PS_WsB0 vs. PS_WsBI 

Population models are denoted by M, analysis models are denoted by their 
respective within- or between-levels (W, B), I indicates an interaction effect at the 
within- or between-group level (WI, BI), 0 indicates a missing interaction effect 
(W0, B0), PS are partially saturated analysis models at the within- or between- 
level (Ws, Bs). Misspecified models have either an interaction effect added to a 
linear model or an existing interaction effect fixed to zero. 
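
An unscaled nested model comparison of this kind can be sketched as follows. The example is an illustration, not the procedure implemented in Mplus: the function names and the fit values are hypothetical, and the upper-tail probability is implemented only for 1 and 2 degrees of freedom, which covers tests of one interaction effect or of both interaction effects at once.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square variable.
    Closed forms are used, so only df = 1 and df = 2 are supported."""
    if x < 0:
        raise ValueError("chi-square values cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def chi2_difference_test(t_restricted, df_restricted, t_full, df_full):
    """Unscaled difference test for two nested models.

    The restricted model fixes the interaction effect(s) to zero and
    therefore has the larger chi-square value and more degrees of freedom.
    """
    difference = t_restricted - t_full
    df_difference = df_restricted - df_full
    return difference, df_difference, chi2_sf(difference, df_difference)


# Hypothetical fit values for an analysis without (restricted) and with
# (full) a single interaction effect.
difference, df_difference, p_value = chi2_difference_test(
    t_restricted=112.4, df_restricted=85, t_full=104.1, df_full=84
)
```

A small p-value indicates that fixing the interaction effect to zero worsens model fit significantly.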

In the following, the analysis models are denoted comparable 
to the population models (see Table 1) but without the "M" at 
the beginning of the name: "W" and "B" again indicate models 
at the within- and the between-group level, "I" indicates that an 
interaction effect is present while "0" indicates that the nonlinear 
effect is fixed to zero. 

There were four different types of analysis models: (1) models 
estimating linear effects at both levels while the interaction effects 
were fixed to zero (W0B0); (2) models estimating an interaction 
effect at the within-group level but no interaction effect at the 
between-group level (WIB0); (3) models estimating an interaction 
effect at the between-group level but not at the within-group 
level (W0BI); and (4) models estimating interaction effects at 
both levels (WIBI). Additionally, there were two types of partially 
saturated analysis models: either the between-group level was 
saturated (Bs) while model fit at the within-group level was 
evaluated (PS_W0Bs, PS_WIBs), or the within-group level was 
saturated (Ws) while model fit at the between-group level was 
evaluated (PS_WsB0, PS_WsBI). 

EVALUATION OF THE MODEL TESTS 

For all types of analysis models the means of the standard 
χ² values and the means of the robust χ² values for estimating 
overall model fit were obtained from 500 replications for 
each condition, and the rejection rates at the nominal level of 
α = 0.05 were computed. The rejection rates for linear models 
with interaction effects additionally included in the model 
can be interpreted as the Type I error of the test; the rejection 
rates for misspecified models with interaction effects fixed 
to zero can be interpreted as the power of the test to 
detect misspecification. Analogously, means of difference values, 
Type I error rates, and power for χ² difference tests were 
obtained. 
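
The computation of a rejection rate over replications can be sketched as follows. This is a hedged illustration, not the authors' code: the function names `rejection_rate` and `monte_carlo_band` are hypothetical, the draws are simulated from a central χ² distribution (via the gamma distribution) rather than taken from fitted models, and the critical value 124.342 is the tabulated 95th percentile of the χ² distribution with 100 degrees of freedom. The second function gives the approximate range in which an empirical rate is expected to fall when the true rate equals α and 500 replications are used.

```python
import math
import random


def rejection_rate(chi2_values, critical_value):
    """Proportion of replications whose chi-square value exceeds the critical value."""
    rejections = sum(1 for value in chi2_values if value > critical_value)
    return rejections / len(chi2_values)


def monte_carlo_band(alpha, n_replications, z=1.96):
    """Approximate 95% sampling band for an empirical rate when the true rate is alpha."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_replications)
    return alpha - z * se, alpha + z * se


# Illustration with simulated draws (assumption: correctly specified model, df = 100).
rng = random.Random(2014)
df = 100
draws = [rng.gammavariate(df / 2.0, 2.0) for _ in range(500)]  # chi-square(df) draws
rate = rejection_rate(draws, critical_value=124.342)
low, high = monte_carlo_band(alpha=0.05, n_replications=500)
print(f"empirical rejection rate: {rate:.3f}")
print(f"expected range under alpha = 0.05: [{low:.3f}, {high:.3f}]")
```

For 500 replications and α = 0.05 the band is roughly 3.1 to 6.9%, which may serve as a rough yardstick when reading the Type I error rates in the tables.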

RESULTS 

In the following, we will only report a representative selection 
of the different analyses (see Table 1) because the results not 
reported here lead to similar conclusions. No non-convergent 
or inadmissible solutions (e.g., negative variance estimates) were 
encountered across all simulated data sets. First, mean χ² values 
and Type I error rates of the overall model test for the ML and 
MLR estimators are given in Table 2. As MLR outperformed 
ML in all conditions, only MLR results are reported 
in the subsequent tables. Power rates for misspecified models 
at the within-group level and at the between-group level are 
given in Table 3. Second, results of the χ² difference tests include 
Type I error rates (Table 4), power rates at the within-group level 
(Table 5), and power rates at the between-group level (Table 6). 

OVERALL MODEL FIT 

ML and MLR mean χ² values and Type I error rates for the population 
model M_WIBI are given in Table 2. For the ML estimator, 
Type I error rates were inflated across all conditions, independent 
of the level being analyzed. The difference between observed 
mean χ² values and degrees of freedom was higher for the 
complete model (WIBI) than for the partially saturated models 
(PS_WIBs, PS_WsBI). Type I error rates using the MLR estimator 
for the population model with interaction effects at both levels 
(M_WIBI) were close to the nominal α level with values ranging 
between 2.6 and 7.8%. Results also indicated that the MLR estimator 
showed a slightly more conservative behavior for higher 
numbers of groups (NG = 500, 1000) when the models partially 
saturated at the within-group level were analyzed. 

MLR mean χ² values and power rates for the population 
model M_WIBI are listed in Table 3. The results show that model 
misspecifications at the within-group level could be reliably 
detected. When the within-group interaction effect in the analysis 
models was fixed to zero, high χ² values indicated significant 
model misfit, and the rejection rate was 100% across conditions. 
Mean χ² values ranged from 278 to 1117 for the analyses of the 
complete model with misspecification at the within-group level 
(W0BI), with higher values in conditions with a larger number of 
groups. The χ²/df ratio was larger for partially saturated models 
(PS_W0Bs) than for the unsaturated models. 
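
The comparison of χ²/df ratios can be retraced with a few lines of arithmetic. The sketch below uses the mean χ² values and degrees of freedom reported in Table 3 for uncorrelated predictors; the dictionary layout and the helper name `chi2_df_ratio` are illustrative assumptions only.

```python
# Mean chi-square values and df from Table 3 (uncorrelated predictors),
# keyed by number of groups; the layout is an illustrative choice.
table3_within = {
    200: {"W0BI": (278.53, 101), "PS_W0Bs": (233.32, 51)},
    500: {"W0BI": (534.43, 101), "PS_W0Bs": (504.91, 51)},
    1000: {"W0BI": (966.16, 101), "PS_W0Bs": (960.51, 51)},
}


def chi2_df_ratio(chi2, df):
    """Ratio of the chi-square value to its degrees of freedom."""
    return chi2 / df


ratios = {
    ng: {model: round(chi2_df_ratio(*values), 2) for model, values in models.items()}
    for ng, models in table3_within.items()
}
for ng, row in ratios.items():
    print(ng, row)
```

For NG = 200, for example, the ratio is about 2.76 for the complete model and about 4.57 for the partially saturated model, because a similar amount of misfit is spread over roughly half as many degrees of freedom.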

Model misspecification at the between-group level with the 
interaction effect fixed to zero was less reliably detected (see 
Table 3). The rejection rates ranged from 12 to 58% for the complete 
model (WIB0) and from 13 to 77% for the partially saturated 
model (PS_WsB0). Power rates were in all conditions higher 
in the partially saturated models than in the complete models 
and increased with the number of groups, especially for 
models with correlated predictor variables. 

For conditions with correlated predictor variables mean χ² 
values and power rates were always larger than for conditions 
with uncorrelated predictors. However, these differences were 
relatively small. 

χ² DIFFERENCE TESTS 

In order to investigate the behavior of the model difference test 
for detecting single interaction effects at one level or at both levels 
simultaneously, several model comparisons were performed. 
MLR mean χ² difference values and Type I error rates for the 
comparison of the correctly specified model without interaction 
effects at both levels (M_W0B0) with misspecified models which 
additionally included an interaction effect either at both levels 
simultaneously, or at the within-group level or the between-group 
level only, are listed in Table 4. The results show that Type I error 



Table 2 | ML and MLR mean χ² values of overall model fit and Type I error rates for the population model with interaction effects at both levels 
(M_WIBI) analyzed with the correct complete model (WIBI), the correct partially saturated within model (PS_WIBs), and the correct partially 
saturated between model (PS_WsBI) under conditions of varying numbers of groups (NG) and uncorrelated predictor variables (φ21 = 0). 

Population model: M_WIBI 

Analysis models 

        WIBI                                PS_WIBs                             PS_WsBI 

NG      χ²       df    Type I error (%)     χ²       df    Type I error (%)     χ²       df    Type I error (%) 

ML 

200     113.59   100   25.6                 56.74    50    18.2                 56.87    50    17.4 

500     113.27   100   26.6                 56.63    50    17.2                 56.65    50    18.2 

1000    111.65   100   20.4                 55.75    50    15.0                 55.89    50    15.6 

MLR 

200     99.65    100   6.0                  50.72    50    7.8                  49.03    50    5.6 

500     97.30    100   3.6                  50.48    50    7.2                  46.97    50    3.4 

1000    95.20    100   2.8                  49.65    50    4.2                  45.75    50    2.6 

χ² is the mean of the Monte Carlo χ² values. 

Table 3 | MLR mean χ² values of overall model fit and power rates for the population model with interaction effects at both levels (M_WIBI) 
analyzed with complete models misspecified at the within level (W0BI) or at the between level (WIB0), and analyzed with partially saturated 
models with misspecification at the within level (PS_W0Bs) or the between level (PS_WsB0) under conditions of varying numbers of groups 
(NG) and correlation of predictor variables (φ21). 

Population model: M_WIBI 

        W0BI                         PS_W0Bs 

NG      χ²        df    Power (%)    χ²        df    Power (%) 

φ21 = 0 

200     278.53    101   100          233.32    51    100 

500     534.43    101   100          504.91    51    100 

1000    966.16    101   100          960.51    51    100 

φ21 = 0.30 

200     314.09    101   100          268.44    51    100 

500     613.68    101   100          588.01    51    100 

1000    1117.46   101   100          1118.05   51    100 

        WIB0                         PS_WsB0 

NG      χ²        df    Power (%)    χ²        df    Power (%) 

φ21 = 0 

200     106.69    101   12.0         55.98     51    13.4 

500     112.66    101   21.6         61.87     51    28.8 

1000    125.29    101   49.4         74.72     51    66.6 

φ21 = 0.30 

200     108.39    101   13.2         58.03     51    16.0 

500     113.69    101   20.8         63.68     51    30.4 

1000    130.93    101   57.8         80.45     51    77.0 

χ² is the mean of the Monte Carlo χ² values. Misspecified models with either an interaction effect added to a linear model or an existing interaction effect fixed to 
zero are in italics. 



Table 4 | MLR mean χ² difference values (Δχ²) and Type I error rates for comparing correctly specified complete or partially saturated models 
without nonlinear effects (W0B0, W0Bs, WsB0) with misspecified models with an added interaction effect at both levels, at the within-group 
level (WI), or at the between-group level (BI) under conditions of varying numbers of groups (NG) and varying correlation of predictor 
variables (φ21). 

Population model: M_W0B0 

        W0B0 vs. WIBI                     PS_W0Bs vs. PS_WIBs               PS_WsB0 vs. PS_WsBI 

NG      Δχ²     Δdf   Type I error (%)    Δχ²     Δdf   Type I error (%)    Δχ²     Δdf   Type I error (%) 

φ21 = 0 

200     2.05    2     3.4                 1.00    1     3.8                 1.05    1     3.6 

500     1.94    2     3.4                 0.93    1     3.4                 1.01    1     3.4 

1000    2.07    2     4.6                 1.05    1     4.4                 1.02    1     3.6 

φ21 = 0.30 

200     2.23    2     5.4                 1.05    1     5.0                 1.18    1     5.6 

500     2.22    2     6.0                 1.11    1     6.0                 1.11    1     3.8 

1000    1.99    2     3.6                 1.07    1     4.6                 0.91    1     3.2 

Δχ² is the mean of the Monte Carlo χ² difference values. Misspecified models with either an interaction effect added to a linear model or an existing interaction 
effect fixed to zero are in italics. 

Table 5 | MLR mean χ² difference values (Δχ²) and power rates for comparing misspecified complete or partially saturated models with a fixed 
interaction effect at the within-group level (W0) with the correct models (WIB0, PS_WIBs) without nonlinear effects at the between-group 
level under conditions of varying numbers of groups (NG) and varying correlation of predictor variables (φ21). 

Population model: M_WIB0 

Misspecified Analysis Models at the Within-Group Level 

        W0B0 vs. WIB0                PS_W0Bs vs. PS_WIBs 

NG      Δχ²       Δdf   Power (%)    Δχ²       Δdf   Power (%) 

φ21 = 0 

200     180.22    1     100          183.61    1     100 

500     439.28    1     100          455.09    1     100 

1000    872.00    1     100          909.39    1     100 

φ21 = 0.30 

200     212.27    1     100          215.98    1     100 

500     518.49    1     100          536.94    1     100 

1000    1026.03   1     100          1070.55   1     100 

Δχ² is the mean of the Monte Carlo χ² difference values. Misspecified models with either an interaction effect added to a linear model or an existing interaction 
effect fixed to zero are in italics. 



Table 6 | χ² difference mean values (Δχ²) and power rates for comparing misspecified complete or partially saturated models with a fixed 
interaction effect at the between-group level (B0) with the correct models (W0BI, PS_WsBI) without nonlinear effects at the within-group level 
under conditions of varying numbers of groups (NG) and varying correlation of predictor variables (φ21). 

Population model: M_W0BI 

Misspecified Analysis Models at the Between-Group Level 

        W0B0 vs. W0BI               PS_WsB0 vs. PS_WsBI 

NG      Δχ²     Δdf   Power (%)     Δχ²     Δdf   Power (%) 

φ21 = 0 

200     7.04    1     68.0          6.93    1     68.0 

500     15.11   1     96.4          14.63   1     96.0 

1000    29.48   1     100           28.35   1     100 

φ21 = 0.30 

200     7.95    1     75.4          7.86    1     74.6 

500     18.05   1     98.6          17.51   1     98.6 

1000    35.36   1     100           34.04   1     100 

Δχ² is the mean of the Monte Carlo χ² difference values. Misspecified models with either an interaction effect added to a linear model or an existing interaction 
effect fixed to zero are in italics. 



rates were close to the nominal α level and tended to be a bit 
conservative in conditions with uncorrelated predictor variables. 
Results of partially saturated models did not deviate from the 
results of complete models, and Type I error rates did not depend 
on the number of groups. 

In Tables 5 and 6, MLR mean χ² difference values and 
power rates for population models M_WIB0 and M_W0BI 
are listed. The results indicate that mean difference values 
were substantially higher for models with within-group 
misspecification than for models with between-group misspecification. 
Additionally, these values were considerably larger 
for increasing numbers of groups but only moderately larger 
for correlated predictor variables. Power of the difference 
test to detect within-group level misspecifications was 
100% in all conditions (Table 5), while power to detect 
between-group level misspecifications ranged from 68 to 100% 
(Table 6). 

DISCUSSION 

In this study we investigated overall model fit evaluation with MLR 
compared to ML for nonlinear MSEM with interaction effects at a 
single level or at both levels simultaneously. We also investigated 
difference tests for detecting single interaction effects. The core 
findings are: 

(1) MLR corrected the overall test statistic sufficiently well, while 
ML always yielded inflated χ² values. Therefore only MLR 
results were reported. 

(2) For properly specified models, Type I error rates of the χ² 
test were close to their nominal α levels. 

(3) Misspecification at the within-group level was reliably 
detected using the χ² test, while the power to detect misspecification 
at the between-group level was fairly low. 

(4) The MLR difference test performed generally fairly well 
with regard to Type I error rates and power, although for the 
smallest number of groups (NG = 200) power of this test was 
low when models at the between-group level were analyzed 
compared to the within-group level. 

(5) Correlated predictors had a negligible effect such that the 
power to detect model misspecification slightly increased for 
both types of χ² tests compared to models with uncorrelated 
predictors. 

Although an adequate overall model test for nonlinear MSEM 
is not yet available, the likelihood ratio test based on covariance 
matrices augmented by product terms performed quite well. 
Using the robust test statistic T_MLR of the Mplus program, non-normality 
resulting from nonlinearity in the model was corrected 
sufficiently well, while T_ML, which assumes multivariate normality, 
should not be used for model fit evaluation of nonlinear 
MSEM. 

As in previous research (Yuan and Bentler, 2007; Ryu 
and West, 2009), the partially saturated approach was more informative 
than the standard approach. When model fit evaluation of 
the entire model indicated a poor fitting model, only level-specific 
evaluation was able to identify the specific level at which the misfit 
occurred. Power to detect misfit at the between-group level was 
quite low, comparable to previous research (Ryu and West, 2009). 
A number of groups of NG = 200 did not seem to be sufficiently large to 
detect model misfit reliably, and even NG = 1000 resulted in low 
power for the standard approach (58%) and a power of 77% for 
the partially saturated approach for correlated latent exogenous 
variables. Power to detect misfit at the within-group level was 
always larger than power at the between-group level. This result 
could be expected because the total sample size was used for the 
analyses at the within-group level, resulting in sample sizes up to 
30,000 subjects. 

Misspecified models were specified by fixing the nonlinear 
effects to zero while keeping the product indicators in the model. 
This type of misspecification is necessary for testing the significance 
of single nonlinear effects using a χ² difference test, a test 
often used because it is generally more reliable than the t-test. In 
our simulation study Type I error rates of the overall χ² test as 
well as the χ² difference test were close to the nominal α level (see 
also Gerhard et al., in press). Power to detect misspecification of a 
single nonlinear effect was again larger at the within-group level 
than at the between-group level for both χ² tests, mirroring results 
of the partially saturated approach for ML-CFA (cf. Ryu and West, 
2009). 

As with all simulation studies there are some limitations which 
we would like to note. First, this study only considered a balanced 
design with within-group sample sizes held constant across 
groups. Additionally, the numbers of groups were large and the 
model structure and parameter values were identical at both levels. 
Further studies should investigate other designs which may 
be more appropriate for empirical research. Second, for the construction 
of the product indicators the matched-pair strategy was 
applied (Marsh et al., 2004), which uses each indicator of the 
latent exogenous constructs only once in specifying the cross-products. 
Alternatively, the all-pair strategy originally introduced 
by Kenny and Judd (1984) could have been applied, which uses 
all possible cross-products. Using all possible products of indicator 
variables may be especially useful when the reliability of the 
indicator variables differs or when the numbers of indicators of 
the latent predictor and moderator variables are unequal. In our 
example this would have resulted in nine instead of three product 
indicators measuring each latent interaction term. Whether 
the amount of nonnormality introduced by the all-pair approach 
could also be corrected by MLR remains to be investigated in 
a later study. Third, the indicator variables were generated with 
zero means. Because we only used balanced designs, the grand 
mean was identical to the mean of the clusters and therefore 
multicollinearity could be reduced at both levels. As balanced 
designs are not to be expected in applied research, more research 
is needed in order to investigate the consequences of using different 
methods for centering variables in the context of nonlinear 
models. 
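
The difference between the two product indicator strategies mentioned above can be made concrete with a short Python sketch. It assumes three indicators per latent variable, as implied by the count of three versus nine products; the indicator names (x1 to x3, z1 to z3), the function names, and the small data set are hypothetical, and mean-centering before forming the products is shown as one common choice rather than as the procedure used in the study.

```python
from itertools import product


def matched_pairs(x_names, z_names):
    """Matched-pair strategy: every indicator is used exactly once."""
    if len(x_names) != len(z_names):
        raise ValueError("matched pairs need equally many indicators per construct")
    return list(zip(x_names, z_names))


def all_pairs(x_names, z_names):
    """All-pair strategy: every possible cross-product of indicators."""
    return list(product(x_names, z_names))


def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


def product_indicator(data, pair):
    """Product of two mean-centered indicator columns."""
    x_name, z_name = pair
    return [a * b for a, b in zip(center(data[x_name]), center(data[z_name]))]


# Hypothetical indicator names for the latent predictor and moderator.
x_indicators = ["x1", "x2", "x3"]
z_indicators = ["z1", "z2", "z3"]
matched = matched_pairs(x_indicators, z_indicators)
everything = all_pairs(x_indicators, z_indicators)
print(len(matched), "matched-pair products:", matched)
print(len(everything), "all-pair products")

# Tiny made-up data set to show how one product indicator is formed.
toy_data = {"x1": [1.0, 2.0, 3.0], "z1": [2.0, 4.0, 6.0]}
print(product_indicator(toy_data, ("x1", "z1")))
```

The all-pair strategy triples the number of non-normally distributed product variables in this setting, which is why its effect on the robust correction is an open question.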

In conclusion, the robust ML estimator performed quite 
well in reliably detecting misspecification of nonlinear MSEM. 
Although the results of our simulation study indicate that MLR 
corrects the test statistic sufficiently well especially at the within- 
group level when the unconstrained product indicator approach 
is used, further research is necessary in order to develop a model 
test which takes the specific type of nonnormality implied by 
latent nonlinear effects explicitly into account. 